Discrete Fourier analysis for phylogenetic trees

Author

  • MARTIN PH. STEHNO
Abstract

Discrete Fourier transformations (DFTs) provide a useful tool for assigning a phylogenetic tree (PGT) to the observed frequencies of nucleotide patterns in the DNA sequences of species. The advantage of this sort of spectral analysis is that it allows global correction for multi-substitution processes [1].

SPECTRAL ANALYSIS OF PGTS

Two spectra characterize a PGT: the probability spectrum p(T) and the expected sequence spectrum s(T). After the edges of the tree are labelled in an appropriate way, the two spectra can be related by two steps of transforms using vector functions called Hadamard conjugations. The intermediate vector is called the edge length spectrum. The transformation scheme is given in Fig. 1.

This scheme can be used in two ways. Starting with a probability distribution, we can calculate the edge length spectrum and the expected sequence spectrum. Conversely, given a data set D, we can take the observed sequence spectrum s(D) (the relative frequencies of character patterns) as an estimate for s(T). From this we calculate a conjugate spectrum γ(D) (the ‘corrected partition frequencies’) [1, 4]. This corrects for all parallel, multiple, and higher-order substitutions. We then find the corresponding tree, that is, the tree T for which | γ(D) – q(T) | is minimal, where q(T) is the edge length spectrum, using a fitting algorithm (e.g. least-squares best fit or the ‘closest tree algorithm’). Having found this tree, one can reconstruct the probability spectrum and the expected sequence spectrum.

HADAMARD CONJUGATION

A conjugation consists of three transformations that are applied successively. The third transformation is the inverse of the first. The m × m Hadamard matrix Ht is defined as (Hendy et al., Proc. Natl. Acad. Sci. USA 91 (1994))

Fig. 1. Scheme of transformations.
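The three-step structure of a conjugation (a Hadamard transform, an element-wise function, then the inverse Hadamard transform) can be illustrated with a minimal sketch. The abstract does not give the formulas, so the following assumes the two-state form commonly attributed to Hendy and Penny, s = H⁻¹ exp(H q) and γ = H⁻¹ ln(H s), with a Sylvester-type Hadamard matrix and a first component q₀ equal to minus the sum of the other edge lengths. All function names and the three-taxon example values are hypothetical and serve only as illustration.

```python
import math


def hadamard_transform(v):
    """Return H v for the Sylvester-type Hadamard matrix H (entries +1/-1).

    Uses the in-place butterfly scheme, so len(v) must be a power of two.
    """
    out = list(v)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("vector length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def inverse_hadamard_transform(v):
    """Return H^{-1} v; for an n x n Sylvester matrix, H^{-1} = H / n."""
    n = len(v)
    return [x / n for x in hadamard_transform(v)]


def edge_spectrum_to_sequence_spectrum(q):
    """Assumed forward conjugation: s = H^{-1} exp(H q)."""
    return inverse_hadamard_transform(
        [math.exp(x) for x in hadamard_transform(q)]
    )


def sequence_spectrum_to_conjugate(s):
    """Assumed inverse conjugation: gamma = H^{-1} ln(H s)."""
    rho = hadamard_transform(s)
    if min(rho) <= 0:
        raise ValueError("logarithm undefined: H s has a non-positive entry")
    return inverse_hadamard_transform([math.log(x) for x in rho])


# Hypothetical three-taxon tree: m = 2^(3-1) = 4 spectrum entries.
# Entries 1..3 are illustrative edge lengths; entry 0 is minus their sum.
edges = [0.1, 0.2, 0.3]
q = [-sum(edges)] + edges

s = edge_spectrum_to_sequence_spectrum(q)
gamma = sequence_spectrum_to_conjugate(s)
```

Under these assumptions, the entries of s sum to 1, and applying the inverse conjugation to s recovers q up to floating-point error. With observed data, s(D) would replace s, and the resulting γ(D) would then be passed to a tree-fitting step such as the closest tree algorithm mentioned in the abstract.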


Similar resources

Application of discrete Fourier inter-coefficient difference for assessing genetic sequence similarity

Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success for detection of coding regions in a gene. Recently, these methods are being used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel e...


Fourier Analysis and Phylogenetic Trees

We give an overview of phylogenetic invariants: a technique for reconstructing evolutionary family trees from DNA sequence data. This method is useful in practice and is based on a number of simple ideas from elementary group theory, probability, linear algebra, and commutative algebra.


Fourier analysis on finite Abelian groups: some graphical applications

A survey of basic techniques of Fourier analysis on a finite Abelian group Q with subsequent applications in graph theory. In particular, evaluations of the Tutte polynomial of a graph G in terms of cosets of the Q-flows (or dually Q-tensions) of G. Other applications to spanning trees of Cayley graphs and group-valued models on phylogenetic trees are also used to illustrate methods.


Quantitative Comparison of Tree Pairs Resulted from Gene and Protein Phylogenetic Trees for Sulfite Reductase Flavoprotein Alpha-Component and 5S rRNA and Taxonomic Trees in Selected Bacterial Species

Introduction: FAD is the cofactor of FAD-FR protein family. Sulfite reductase flavoprotein alpha-component is one of the main enzymes of this family. Based on applications of this enzyme in biotechnology and industry, it was chosen as the subject of evolutionary studies in 19 specific species. Method: Gene and protein sequences of sulfite reductase flavoprotein alpha-component, 5S rRNA sequence...


Generating functions for multi-labeled trees

Multi-labeled trees are a generalization of phylogenetic trees that are used, for example, in the study of gene versus species evolution and as the basis for phylogenetic network construction. Unlike phylogenetic trees, in a leaf-multi-labeled tree it is possible to label more than one leaf by the same element of the underlying label set. In this paper we derive formulae for generating function...




Publication date: 2001